AI-driven translations for kidney transplant equity in Hispanic populations

Access to Spanish-language kidney transplant information remains a substantial health equity challenge for the Hispanic community. This study evaluated ChatGPT’s capabilities in translating 54 English kidney transplant frequently asked questions (FAQs) into Spanish using two versions of the AI model, GPT-3.5 and GPT-4.0. The FAQs included 19 from the Organ Procurement and Transplantation Network (OPTN), 15 from the National Health Service (NHS), and 20 from the National Kidney Foundation (NKF). Two native Spanish-speaking nephrologists, both of Mexican heritage, scored the translations for linguistic accuracy and cultural sensitivity tailored to Hispanics using a 1–5 rubric. The inter-rater reliability of the evaluators, measured by Cohen’s Kappa, was 0.85. Overall linguistic accuracy was 4.89 ± 0.31 for GPT-3.5 versus 4.94 ± 0.23 for GPT-4.0 (non-significant, p = 0.23). Both versions scored 4.96 ± 0.19 in cultural sensitivity (p = 1.00). By source, GPT-3.5 linguistic accuracy was 4.84 ± 0.37 (OPTN), 4.93 ± 0.26 (NHS), and 4.90 ± 0.31 (NKF). GPT-4.0 scored 4.95 ± 0.23 (OPTN), 4.93 ± 0.26 (NHS), and 4.95 ± 0.22 (NKF). For cultural sensitivity, GPT-3.5 scored 4.95 ± 0.23 (OPTN), 4.93 ± 0.26 (NHS), and 5.00 ± 0.00 (NKF), while GPT-4.0 scored 5.00 ± 0.00 (OPTN), 5.00 ± 0.00 (NHS), and 4.90 ± 0.31 (NKF). These high linguistic accuracy and cultural sensitivity scores indicate that ChatGPT effectively translated the English FAQs into Spanish across sources. The findings suggest ChatGPT’s potential to promote health equity by improving Spanish-language access to essential kidney transplant information. Additional research should evaluate its medical translation capabilities across diverse contexts and languages. These English-to-Spanish translations may increase access to vital transplant information for underserved Spanish-speaking Hispanic patients.

The concept of health equity involves providing every individual with a fair and just opportunity to attain their highest level of health [1]. Unfortunately, disparities in healthcare access and the distribution of medical information continue to be significant barriers [2]. For the Hispanic community, particularly those who primarily speak Spanish, these barriers are often compounded by linguistic challenges, limiting their access to essential healthcare information [3–6]. A recent study examined trends in poor health indicators among Black and Hispanic middle-aged and older adults in the United States from 1999 to 2018 [7]. The study found that, while Hispanics showed overall improvements in physical inactivity and perceived poor health, they experienced deterioration in hypertension and diabetes rates. Notably, the study reported no significant change in the Hispanic-White gap for kidney disease over the 20-year period, indicating that the disparity in this specific condition did not improve [7]. In the context of kidney transplantation, where understanding complex medical information is crucial, this language barrier presents a substantial obstacle [8,9]. The Hispanic population is disproportionately affected by kidney diseases, including higher prevalence rates of conditions leading to kidney failure. According to epidemiological studies, Hispanics are more likely to develop end-stage kidney disease (ESKD) compared to non-Hispanic whites [10–12].
Additionally, they face longer waiting times for kidney transplants and lower rates of referral for transplant evaluations [9,13]. These disparities can be attributed to several factors, including language barriers that impede effective communication between healthcare providers and patients, leading to misunderstandings, missed appointments, and incomplete or inaccurate medical documentation. Moreover, the lack of culturally and linguistically appropriate health information contributes to a lower level of health literacy among this population, further complicating their navigation through the transplant referral and evaluation process [3–6].
The provision of culturally appropriate health information is crucial in managing and treating chronic conditions like kidney disease. Culturally sensitive information takes into account not just the language but also the cultural beliefs, practices, and values of a community [14–16]. This approach is particularly important in the Hispanic community, where cultural nuances play a vital role in health-related decision-making. Additionally, language barriers can significantly impact the quality of healthcare received by non-English-speaking patients [17,18]. In the United States, a considerable portion of the Hispanic population has limited English proficiency, making it challenging for them to access and understand health information in English [18,19]. This gap is not just a matter of translation but involves conveying complex medical concepts in a linguistically and culturally appropriate manner.
Artificial intelligence, particularly advanced language models like ChatGPT 3.5 and 4.0, presents an innovative approach to addressing the challenges of language barriers in healthcare [20–28]. These AI models hold the potential to accurately and sensitively translate complex medical information, thereby making it accessible to a wider audience [29–36]. In the specific context of kidney transplantation, where the necessity for detailed and accurate information is critical, the role of AI-driven translations could be transformative, offering a significant advancement in how medical information is communicated to non-English-speaking populations.
The primary objective of this study is to evaluate the effectiveness of ChatGPT 3.5 and 4.0 in translating kidney transplantation-related FAQs from English to Spanish, tailored for the Hispanic community. The study focuses on the accuracy and cultural sensitivity of these translations, assessing whether these AI tools can provide reliable, comprehensible, and culturally appropriate medical information. By doing so, the study seeks to determine the potential of AI in improving health information accessibility and contributing to health equity for Spanish-speaking Hispanics.

Data collection
This study was conducted to perform English-to-Spanish translation of 54 frequently asked questions (FAQs) regarding kidney transplantation. The FAQs were selected to comprehensively represent the topics relevant to patients considering or undergoing kidney transplantation. The FAQs were obtained from (1) the Organ Procurement and Transplantation Network (OPTN) [37]: 19 questions focusing on eligibility criteria, the waitlist process, and post-transplant care; (2) the National Health Service (NHS) [38]: 15 questions focusing on patient preparation for kidney transplantation, surgical procedures, and post-transplant care; and (3) the National Kidney Foundation (NKF) [39]: 20 questions focusing on long-term management, lifestyle considerations, and support resources for kidney transplant recipients (Online supplementary data). This study is exempt from Ethics Committee or Institutional Review Board approval, as it involves neither human nor animal subjects, nor does it encompass patient information or identifiable personal data.

AI language model usage
The translation process utilized ChatGPT versions 3.5 and 4.0 [40]. These AI chatbots were chosen for their advanced natural language processing capabilities [41,42], which include the ability to understand context, generate coherent and contextually appropriate text, and maintain consistency in translations. Each selected FAQ was input into the AI chatbot in its English version, and the models then provided Spanish translations. This process was conducted individually for each question to ensure that each translation was contextually accurate. The AI chatbots were configured to optimize for translation accuracy and cultural relevance, focusing on nuances that would make the translations suitable for the Hispanic community. The study was conducted in December 2023.
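The per-question workflow described above can be pictured with a minimal sketch. It is illustrative only: the study does not report the exact prompt or how the chatbots were accessed, so the prompt wording, the function names, and the stand-in `fake_model` are all hypothetical. The model call is passed in as a plain function so that no particular vendor API is assumed.

```python
from typing import Callable, Dict, List

# Hypothetical prompt wording; the study does not report the prompt it used.
PROMPT_TEMPLATE = (
    "Translate the following kidney transplant FAQ from English to Spanish "
    "for a Hispanic patient audience:\n\n{question}"
)


def build_prompt(question: str) -> str:
    """Wrap one English FAQ in the (assumed) translation instruction."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def translate_faqs(
    questions: List[str], translate_fn: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Submit each FAQ on its own, so no question shares context with another."""
    results = []
    for question in questions:
        spanish = translate_fn(build_prompt(question))
        results.append({"english": question, "spanish": spanish})
    return results


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: echoes the question with a tag."""
    return "[ES] " + prompt.split("\n\n", 1)[1]


demo = translate_faqs(["What is a kidney transplant?"], fake_model)
print(demo[0]["spanish"])
```

Keeping one question per request mirrors the individual processing described in the text; swapping `fake_model` for a real client would not change the loop.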

Systematic evaluation of translations
Each translation was evaluated using a detailed rubric scale ranging from 1 to 5 (Online supplementary data), where 1 represents poor performance and 5 represents excellent performance [43]. The rubric scale was designed to assess two key aspects:
• Linguistic accuracy: This criterion evaluated the grammatical correctness, appropriate use of vocabulary, and syntactic integrity of the translations. Translations were examined for their clarity, readability, and technical precision in medical terminology.
• Cultural sensitivity: This measure assessed the extent to which translations respected and incorporated cultural nuances, idiomatic expressions, and contextually relevant information for the Hispanic community. This aspect was crucial to ensure that the translations were not only linguistically accurate but also culturally resonant and sensitive to the needs of the target audience.
Two nephrologists of Mexican heritage, fluent in Spanish, O.A.G.V. and M.G.S., evaluated the translations for accuracy and cultural relevance using a 1–5 scale. The evaluation process totaled approximately 40 h, with each expert contributing around 20 h. They began with O.A.G.V.'s initial assessments, which M.G.S. reviewed and confirmed, and any differences were resolved through consensus. The inter-rater reliability of the evaluators, measured by Cohen's Kappa, was 0.85, indicating a high level of agreement and supporting the reliability and credibility of the findings.
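For readers unfamiliar with Cohen's Kappa, the following sketch shows how the statistic is computed from two raters' scores: observed agreement is corrected for the agreement expected by chance from each rater's score distribution. The score lists are hypothetical and are not the study's ratings; the function is the standard unweighted kappa and is not necessarily the exact procedure the authors used.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty item set")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 1-5 rubric scores for ten translations (not the study data).
scores_a = [5, 5, 5, 4, 5, 4, 5, 5, 3, 5]
scores_b = [5, 5, 5, 4, 5, 5, 5, 5, 3, 5]
kappa = cohens_kappa(scores_a, scores_b)
print(round(kappa, 3))
```

In this toy example the raters agree on 9 of 10 items (observed agreement 0.90) while chance agreement is 0.59, so kappa is (0.90 − 0.59) / (1 − 0.59), roughly 0.76.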

Statistical analysis
The scores for linguistic accuracy and cultural sensitivity were summarized as mean ± standard deviation (SD). Scores were compared between GPT-3.5 and GPT-4.0 using a paired t-test and across the three question sources using analysis of variance (ANOVA). A two-tailed p-value less than 0.05 was considered statistically significant. Statistical analyses were performed using JMP statistical software (version 17, SAS Institute, Cary, NC).
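The two comparisons named above can be sketched in a few lines of standard-library Python. This is not a reproduction of the JMP analysis: the scores are hypothetical, and the functions return only the test statistics and degrees of freedom. Converting them to p-values requires the t and F distributions, which a statistics package would supply.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for y - x on the same items."""
    diffs = [b - a for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample SD of the per-item differences
    if sd == 0:
        raise ValueError("All differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """F statistic with between-group and within-group degrees of freedom."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        raise ValueError("No within-group variance; F is undefined")
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for six FAQs under each model (not the study data).
gpt35 = [5, 4, 5, 5, 4, 5]
gpt40 = [5, 5, 5, 5, 4, 5]
t_stat, t_df = paired_t(gpt35, gpt40)

# Hypothetical scores for one model, grouped by question source.
by_source = [[5, 5, 4], [5, 4, 4], [5, 5, 5]]
f_stat, df_b, df_w = one_way_anova_f(by_source)
print(t_stat, t_df, f_stat, df_b, df_w)
```

The paired test fits the design because both models translated the same 54 questions, so each question contributes one difference; the ANOVA treats the three sources as independent groups of questions.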

Results
The scores for linguistic accuracy and cultural sensitivity of GPT-3.5 and GPT-4.0 for individual FAQs are shown in Table S1. The mean linguistic accuracy score was 4.89 ± 0.31 for GPT-3.5 and 4.94 ± 0.23 for GPT-4.0. There was no significant difference in mean linguistic accuracy score between GPT-3.5 and GPT-4.0 across all questions (p = 0.26) or when stratified by FAQ source. The mean linguistic accuracy score was comparable across the three FAQ sources for GPT-3.5 (4.84 ± 0.37 vs. 4.93 ± 0.26 vs. 4.90 ± 0.31 for FAQs from OPTN, NHS, and NKF, respectively; p = 0.70) and GPT-4.0 (4.95 ± 0.23 vs. 4.93 ± 0.26 vs. 4.95 ± 0.22 for FAQs from OPTN, NHS, and NKF, respectively; p = 0.98) (Table 1).
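As a quick consistency check, the overall means should equal the per-source means weighted by the number of questions from each source (19, 15, and 20). The sketch below uses only the figures reported in this article; the variable and function names are illustrative.

```python
from typing import Dict

# Question counts per source, as reported in the Data collection section.
counts: Dict[str, int] = {"OPTN": 19, "NHS": 15, "NKF": 20}

# Per-source mean linguistic accuracy scores, as reported in the Results.
linguistic: Dict[str, Dict[str, float]] = {
    "GPT-3.5": {"OPTN": 4.84, "NHS": 4.93, "NKF": 4.90},
    "GPT-4.0": {"OPTN": 4.95, "NHS": 4.93, "NKF": 4.95},
}


def pooled_mean(means: Dict[str, float], n: Dict[str, int]) -> float:
    """Weight each source's mean by its question count."""
    return sum(means[s] * n[s] for s in n) / sum(n.values())


pooled = {m: round(pooled_mean(v, counts), 2) for m, v in linguistic.items()}
print(pooled)
```

The weighted means come out at 4.89 for GPT-3.5 and 4.94 for GPT-4.0, matching the overall values reported above.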

Discussion
The study evaluated the translation capabilities of ChatGPT 3.5 and 4.0, focusing on translating kidney transplantation FAQs for the Hispanic community. The main results indicate that both versions achieved high levels of accuracy and cultural sensitivity, with ChatGPT 4.0 slightly outperforming 3.5 in terms of accuracy. Specifically, ChatGPT 3.5 demonstrated exceptional cultural sensitivity, especially in the NKF subgroup, while ChatGPT 4.0 consistently scored perfect accuracy across all questions. The study's results are especially significant in the context of health equity. By offering accurate and culturally sensitive translations, AI models like ChatGPT can play a crucial role in leveling the informational playing field for non-English-speaking communities. This is particularly important for Hispanics affected by kidney diseases, who often encounter linguistic hurdles in accessing vital health information [44–46]. The ability of ChatGPT to provide translations that are not only linguistically accurate but also culturally resonant is key to its effectiveness as a tool for disseminating medical information.
While both versions demonstrated high accuracy and cultural sensitivity, it is noteworthy that ChatGPT 3.5 had occasional lower scores in either accuracy or cultural sensitivity in specific questions. This suggests that while the model is highly effective, there is room for improvement, particularly in handling certain nuances that require deeper cultural understanding. In contrast, ChatGPT 4.0's consistent scoring of 5 in accuracy for all questions reflects advancements in AI technology, although it too faced challenges in cultural sensitivity in a few instances. The effectiveness of AI in translation is not solely dependent on linguistic accuracy but also on its ability to resonate culturally with the intended audience. This is particularly crucial in healthcare, where the cultural context can significantly impact how information is received and acted upon [47–49].
Comparing this study's findings with previous research in AI-driven language translation in healthcare, it is evident that there have been significant advancements [50–54]. Earlier research often pointed out the limitations of AI models in grasping the complexities of language and cultural context, particularly in medical translations where both accuracy and sensitivity are crucial. These models typically struggled to maintain a balance between literal accuracy and the deeper layers of cultural context, resulting in translations that were technically correct but often lacked relevance and appropriateness in a real-world setting. In contrast, this study highlights significant progress with ChatGPT 3.5 and 4.0, illustrating their improved ability not only to translate complex medical information accurately but also to consider cultural appropriateness in these translations. This progress signifies a move towards more sophisticated AI models that are linguistically adept and more in tune with the cultural and contextual aspects of language, meeting the practical needs of diverse patient groups, like those seeking information on kidney transplantation.
Table 1. Mean scores for linguistic accuracy and cultural sensitivity of GPT-3.5 and GPT-4.0. *p-value between GPT-3.5 and 4.0. #p-value between question sources.
The study, while pivotal in evaluating the translation capabilities of ChatGPT 3.5 and 4.0 for kidney transplantation FAQs in Spanish for the Hispanic community, presents certain limitations that shape the scope and applicability of its findings. Its focus is narrowly tailored to a specific medical context and a particular linguistic group, which may not encompass the varied complexities of other medical domains or cater to different cultural backgrounds. The reliance on human evaluators introduces an element of subjectivity in assessing translation accuracy and cultural sensitivity, potentially affecting the consistency of the results. Furthermore, the study's constraint to only two AI models limits a broader comparative analysis across the spectrum of available AI translation technologies. Future research, therefore, should aim to broaden the scope to include diverse medical topics and languages, extend evaluations to a wider range of AI models, and incorporate more objective assessment methods. Such expansion and refinement in research approach would not only enhance the generalizability of the findings but also deepen the understanding of AI's potential in overcoming language barriers in global healthcare contexts.
In addition, in future research, the exploration of AI-driven translation tools like ChatGPT 3.5 and 4.0 in real clinical practice represents a critical area for advancement, especially in the context of kidney transplantation and health equity. These studies should focus on evaluating the impact of AI translations on patient outcomes, understanding, and engagement in their healthcare journey. Integration with healthcare systems, including electronic health records and patient portals, is also essential to assess the efficiency and effectiveness of AI tools in a clinical setting. Feedback from healthcare providers will be invaluable, offering insights into the practical utility, accuracy, and cultural appropriateness of these translations in enhancing patient care. Additionally, longitudinal studies observing the long-term effects of AI translation tools, their cost-effectiveness, and comparative analyses

Figure 1. Comparative analysis of average accuracy and cultural sensitivity in AI-generated translations of kidney transplant information. Top panel: (Left) GPT 3.5: average accuracy across different organizations (OPTN, NHS, NKF) and overall score. (Right) GPT 3.5: average cultural sensitivity across different organizations (OPTN, NHS, NKF) and overall score. Bottom panel: (Left) GPT 4.0: average accuracy across different organizations (OPTN, NHS, NKF) and overall score. (Right) GPT 4.0: average cultural sensitivity across different organizations (OPTN, NHS, NKF) and overall score.